22 research outputs found

    Evaluation-as-a-service for the computational sciences: overview and outlook

    Evaluation in empirical computer science is essential to show progress and to assess the technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted to this day. In recent years, however, several new challenges have emerged that do not fit this paradigm well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants to work on locally, but keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables. The objectives of this article are to summarize and compare the current approaches and to consolidate the experiences gained with them in order to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of the motivations of the various stakeholders, from funding agencies to challenge organizers, researchers and participants, and industry interested in supplying real-world problems for which they require solutions. EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often unavailable, making the reproduction of results impossible. EaaS, by contrast, creates reusable and citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed at which research results are obtained.
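    A minimal sketch of the EaaS interaction pattern described above, in which the test data stays on the organizer's infrastructure and participants only ship executables and retrieve scores through an API. The endpoint layout, payload fields, and function names are illustrative assumptions, not those of any specific EaaS platform.

```python
# Hypothetical EaaS client: the executable is shipped to the server,
# the data never leaves the organizer's infrastructure.
import requests

API_BASE = "https://eval.example.org/api"  # assumed evaluation-service URL


def submit_run(task_id: str, image_ref: str, token: str) -> dict:
    """Register a containerized system for server-side evaluation on task_id."""
    resp = requests.post(
        f"{API_BASE}/tasks/{task_id}/runs",
        headers={"Authorization": f"Bearer {token}"},
        json={"image": image_ref},  # participant ships an executable, not data requests
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"run_id": "...", "status": "queued"}


def fetch_scores(task_id: str, run_id: str, token: str) -> dict:
    """Retrieve aggregate evaluation scores; raw test data is never exposed."""
    resp = requests.get(
        f"{API_BASE}/tasks/{task_id}/runs/{run_id}/scores",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```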

    Dynamical complexity of short and noisy time series: Compression-Complexity vs. Shannon entropy

    Shannon entropy has been extensively used for characterizing the complexity of time series arising from chaotic dynamical systems and stochastic processes such as Markov chains. However, for short and noisy time series, Shannon entropy performs poorly. Complexity measures which are based on lossless compression algorithms are a good substitute in such scenarios. We evaluate the performance of two such Compression-Complexity Measures, namely Lempel-Ziv complexity (LZ) and Effort-To-Compress (ETC), on short time series from chaotic dynamical systems in the presence of noise. Both LZ and ETC outperform Shannon entropy (H) in accurately characterizing the dynamical complexity of such systems. For very short binary sequences (which arise in neuroscience applications), ETC has a higher number of distinct complexity values than LZ and H, thus enabling a finer resolution. For two-state ergodic Markov chains, we empirically show that ETC converges to a steady-state value faster than LZ. Compression-Complexity measures are promising for applications which involve short and noisy time series.
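    For concreteness, a minimal sketch of the three measures compared in the abstract, computed for a short binary sequence: Shannon entropy of the empirical symbol distribution, Lempel-Ziv complexity via the classic Kaspar-Schuster phrase counting, and ETC via a simplified Non-Sequential Recursive Pair Substitution (NSRPS) loop. Normalization and tie-breaking details are assumptions and may differ from the authors' implementations.

```python
from collections import Counter
import math


def shannon_entropy(seq):
    """Shannon entropy H (bits/symbol) of the empirical symbol distribution."""
    counts, n = Counter(seq), len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lz_complexity(seq):
    """Lempel-Ziv complexity (LZ76), counted with the Kaspar-Schuster scheme."""
    s, n = list(seq), len(seq)
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # prefix cannot reproduce the new part: count a new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def etc(seq):
    """Effort-To-Compress: number of simplified NSRPS passes (replace the most
    frequent adjacent pair with a new symbol) until the sequence is constant."""
    s = list(seq)
    next_sym, steps = max(s) + 1, 0
    while len(s) > 1 and len(set(s)) > 1:
        pair = Counter(zip(s, s[1:])).most_common(1)[0][0]
        out, i = [], 0
        while i < len(s):  # replace non-overlapping occurrences left to right
            if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(s[i])
                i += 1
        s, next_sym, steps = out, next_sym + 1, steps + 1
    return steps


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]  # short noisy binary sequence
    print(shannon_entropy(x), lz_complexity(x), etc(x))
```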

    Using Document-Quality Measures to Predict Web-Search Effectiveness

    The query-performance prediction task is to estimate retrieval effectiveness in the absence of relevance judgments. The task becomes highly challenging over the Web due to, among other reasons, the effect of low-quality (e.g., spam) documents on retrieval performance. To address this challenge, we present a novel prediction approach that utilizes query-independent document-quality measures. While using these measures was shown to improve Web-retrieval effectiveness, this is the first study demonstrating the clear merits of using them for query-performance prediction. Evaluation performed with large-scale Web collections shows that our methods post prediction quality that often surpasses that of state-of-the-art predictors, including those devised specifically for Web retrieval.
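    A minimal sketch of the general idea: aggregate query-independent quality signals of the top-ranked documents into a single post-retrieval predictor, the intuition being that a result list dominated by low-quality (e.g., spam) pages signals a poorly performing query. The features, equal weights, and mean aggregation below are illustrative assumptions, not the estimators defined in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class DocQuality:
    # Hypothetical query-independent signals, each normalized to [0, 1].
    non_spam_score: float      # 0 = likely spam, 1 = likely non-spam
    stopword_coverage: float   # coverage of a stopword list, a common quality cue
    inlink_score: float        # link-based authority


def doc_quality(d: DocQuality) -> float:
    """Combine query-independent signals into one quality value (equal weights
    purely for illustration)."""
    return (d.non_spam_score + d.stopword_coverage + d.inlink_score) / 3.0


def predict_query_performance(ranked_docs: List[DocQuality], k: int = 10) -> float:
    """Predict retrieval effectiveness as the mean quality of the top-k results."""
    top = ranked_docs[:k]
    return mean(doc_quality(d) for d in top) if top else 0.0
```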

    Type-dependent parameter inference


    A Global Search Architecture

    Recent advances in communication and storage technology make vast quantities of on-line information available. But this information is of limited use unless it can be searched effectively. The huge scale and heterogeneity of the data raise a unique combination of architectural issues that must be addressed to support effective search. These issues call for multi-user distributed search databases with the following capabilities: efficient structured searching of the contents of files having various schemas; continuous availability in spite of failures and maintenance; and high-throughput incorporation of a continuous stream of updates, especially the arrival of new data and the removal of obsolete data. We present an architecture that embodies solutions to specific technical problems that arise in providing these capabilities. The global data abstraction: Until recently, the availability of useful information was constrained by the capacity of physical storage and communication media. Whil..
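    A minimal sketch of the ingredients the abstract names: hash-partitioned search nodes for scale, scatter-gather query fan-out with result merging, and a continuous update path that adds new documents and removes obsolete ones. All class and method names are illustrative assumptions rather than the architecture proposed in the paper.

```python
from collections import defaultdict


class Shard:
    """One search node holding a hash-assigned partition of the corpus."""

    def __init__(self):
        self.docs = {}                  # doc_id -> text
        self.index = defaultdict(set)   # term -> set of doc_ids

    def upsert(self, doc_id, text):
        self.delete(doc_id)             # drop an obsolete version if present
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old:
            for term in old.lower().split():
                self.index[term].discard(doc_id)

    def search(self, term):
        return set(self.index.get(term.lower(), set()))


class GlobalSearch:
    """Routes updates to shards by hash and fans queries out to every shard."""

    def __init__(self, n_shards=4):
        self.shards = [Shard() for _ in range(n_shards)]

    def _shard_for(self, doc_id):
        return self.shards[hash(doc_id) % len(self.shards)]

    def upsert(self, doc_id, text):
        self._shard_for(doc_id).upsert(doc_id, text)

    def delete(self, doc_id):
        self._shard_for(doc_id).delete(doc_id)

    def search(self, term):
        hits = set()
        for shard in self.shards:       # scatter-gather over all partitions
            hits |= shard.search(term)
        return hits
```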

    Is the First Query the Most Important: An Evaluation of Query Aggregation Schemes in Session Search
